Polyak stepsize


Parameter-free Clipped Gradient Descent Meets Polyak

Takezawa, Yuki

Neural Information Processing Systems

Gradient descent and its variants are the de facto standard algorithms for training machine learning models. Because gradient descent is sensitive to its hyperparameters, they must be tuned carefully, typically via grid search.
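For illustration, here is a minimal sketch of gradient descent that combines the classic Polyak stepsize with a cap on the update norm. This is a toy under stated assumptions, not the paper's parameter-free method: in particular, it assumes the optimal value f_star is known, which is exactly the knowledge the paper aims to remove.

import numpy as np

def clipped_polyak_gd(f, grad_f, x0, f_star=0.0, clip=1.0, n_steps=100):
    # Toy sketch, not the paper's algorithm: the classic Polyak
    # stepsize (f(x) - f*) / ||g||^2, capped so each update has
    # norm at most `clip`. Assumes f_star is known.
    x = x0
    for _ in range(n_steps):
        g = grad_f(x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            break
        eta = min((f(x) - f_star) / g_norm**2, clip / g_norm)
        x = x - eta * g
    return x

Taking the minimum of the Polyak ratio and clip / ||g|| is one standard way to express clipping, since it bounds the update norm ||eta * g|| by clip.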




Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Neural Information Processing Systems

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms guarantee convergence only to a neighborhood of a solution, which may result in an output worse than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al.), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance reduction (VR) fail to converge (Dubois-Taine et al.).
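As a concrete reference point, one common bounded form of the stochastic Polyak stepsize (often written SPS_max) takes gamma = min((f_i(x) - f_i*) / (c * ||g||^2), gamma_max) for the sampled loss f_i. A minimal sketch of one SGD step with this rule, with illustrative names and defaults:

import numpy as np

def sps_step(x, grad_fi, fi_x, fi_star=0.0, c=0.5, gamma_max=1.0):
    # One SGD step with a bounded stochastic Polyak stepsize.
    # fi_star is the minimum of the sampled loss f_i (often 0 for
    # over-parameterized models under interpolation). Names and
    # defaults here are assumptions for illustration.
    g_norm2 = float(np.dot(grad_fi, grad_fi))
    if g_norm2 == 0.0:
        return x
    gamma = min((fi_x - fi_star) / (c * g_norm2), gamma_max)
    return x - gamma * grad_fi

The cap gamma_max is what confines the iterates to a neighborhood of the solution in non-interpolation settings, which is the first issue the abstract highlights.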






Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

Díaz, Mateo, Jiang, Liwei, Labassi, Abdel Ghani

arXiv.org Artificial Intelligence

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a certain sense, ill-conditioned or overparameterized, the subgradient method exhibits much slower sublinear convergence even when the convex function is well-conditioned. To overcome this limitation, we introduce a Levenberg-Morrison-Marquardt subgradient method that converges linearly under mild regularity conditions at a rate determined solely by the convex function. Further, we demonstrate that these regularity conditions hold for several problems of practical interest, including square-variable formulations, matrix sensing, and tensor factorization. Numerical experiments illustrate the benefits of our method.
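A minimal sketch of one Levenberg-Marquardt-style preconditioned subgradient step for min_x h(F(x)), with h convex and F smooth. The callables J and subgrad_h, the fixed damping lam, and the fixed stepsize gamma are assumptions for illustration; the paper's exact damping and stepsize rules will differ.

import numpy as np

def lm_subgradient_step(x, F, J, subgrad_h, gamma=0.1, lam=1e-3):
    # Illustrative sketch: J(x) returns the Jacobian of F at x
    # (shape (m, n)), and subgrad_h returns some w in the
    # subdifferential of h at F(x) (shape (m,)).
    Jx = J(x)
    w = subgrad_h(F(x))
    v = Jx.T @ w  # chain-rule subgradient of the composition h(F(x))
    # Precondition by the damped Gauss-Newton matrix J^T J + lam * I,
    # which compensates for ill-conditioning or overparameterization
    # in the smooth map F.
    P = Jx.T @ Jx + lam * np.eye(x.size)
    return x - gamma * np.linalg.solve(P, v)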


